Adversarial examples for extreme multilabel text classification

Abstract

Extreme Multilabel Text Classification (XMTC) is a text classification problem in which (i) the output space is extremely large, (ii) each data point may have multiple positive labels, and (iii) the data follows a strongly imbalanced distribution. With applications in recommendation systems and automatic tagging of web-scale documents, research on XMTC has focused on improving prediction accuracy and dealing with imbalanced data. However, the robustness of deep learning based XMTC models against adversarial examples has been largely underexplored. In this paper, we investigate the behaviour of XMTC models under adversarial attacks. To this end, we first define adversarial attacks in multilabel text classification problems. We categorize attacks on multilabel text classifiers as (a) positive-to-negative, where the target positive label should fall out of the top-k predicted labels, and (b) negative-to-positive, where the target negative label should be among the top-k predicted labels. Then, by experiments on APLC-XLNet and AttentionXML, we show that XMTC models are highly vulnerable to positive-to-negative attacks but more robust to negative-to-positive ones. Furthermore, our experiments show that the success rate of positive-to-negative adversarial attacks has an imbalanced distribution. More precisely, tail classes are highly vulnerable to adversarial attacks, for which an attacker can generate adversarial samples with high similarity to the actual data points. To overcome this problem, we explore the effect of rebalanced loss functions in XMTC: not only do they increase accuracy on tail classes, but they also improve the robustness of these classes against adversarial attacks. The code is available at https://github.com/xmc-aalto/adv-xmtc .
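The two attack categories named in the abstract can be made concrete as success checks on a model's top-k predictions. The following is a minimal sketch, not the authors' implementation: the function names (`top_k_labels`, `positive_to_negative_success`, `negative_to_positive_success`) and the toy score vectors are assumptions introduced here for illustration, and the sketch only tests whether an attack has succeeded; it does not generate adversarial text.

```python
import numpy as np


def top_k_labels(scores, k):
    """Return the set of indices of the k highest-scoring labels."""
    scores = np.asarray(scores, dtype=float)
    # Sort by descending score; a stable sort keeps ties in index order.
    order = np.argsort(-scores, kind="stable")
    return set(order[:k].tolist())


def positive_to_negative_success(adv_scores, target_label, k):
    """Attack succeeds if the target positive label falls out of the top-k."""
    return target_label not in top_k_labels(adv_scores, k)


def negative_to_positive_success(adv_scores, target_label, k):
    """Attack succeeds if the target negative label enters the top-k."""
    return target_label in top_k_labels(adv_scores, k)


# Hypothetical label scores for one document, before and after perturbation.
clean_scores = [0.90, 0.70, 0.10, 0.05]
adv_scores = [0.20, 0.70, 0.60, 0.05]
k = 2

# Label 0 was in the clean top-2 and is pushed out by the perturbation.
p2n = positive_to_negative_success(adv_scores, target_label=0, k=k)
# Label 2 was outside the clean top-2 and is pulled in by the perturbation.
n2p = negative_to_positive_success(adv_scores, target_label=2, k=k)
print(p2n, n2p)
```

In this toy case both checks return `True`. A real evaluation would additionally constrain how similar the perturbed text must remain to the original document, which the abstract mentions but does not specify here.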

Similar resources

Multilabel Text Classification for Automated Tag Suggestion

The increased popularity of tagging during the last few years can be mainly attributed to its embracing by most of the recently thriving user-centric content publishing and management Web 2.0 applications. However, tagging systems have some limitations that have led researchers to develop methods that assist users in the tagging process, by automatically suggesting an appropriate set of tags. W...

Adversarial Extreme Multi-label Classification

The goal in extreme multi-label classification is to learn a classifier which can assign a small subset of relevant labels to an instance from an extremely large set of target labels. Datasets in extreme classification exhibit a long tail of labels which have a small number of positive training instances. In this work, we pose the learning task in extreme classification with a large number of tail-...

Flexible Text Segmentation with Structured Multilabel Classification

Many language processing tasks can be reduced to breaking the text into segments with prescribed properties. Such tasks include sentence splitting, tokenization, named-entity extraction, and chunking. We present a new model of text segmentation based on ideas from multilabel classification. Using this model, we can naturally represent segmentation problems involving overlapping and non-contiguo...

Database-Text Alignment via Structured Multilabel Classification

This paper addresses the task of aligning a database with a corresponding text. The goal is to link individual database entries with sentences that verbalize the same information. By providing explicit semantics-to-text links, these alignments can aid the training of natural language generation and information extraction systems. Beyond these pragmatic benefits, the alignment problem is appeali...

Multilabel Associative Text Classification Using Summarization

This paper deals with the concern of curse of dimensionality in the Text Classification problem using Text Summarization. Classification and association rule mining can produce well-organized as well as precise classifiers than established techniques [1]. However, associative classification technique still suffers from the vast set of mined rules. Thus, this work brings in advantages of Automat...

Journal

Journal title: Machine Learning

Year: 2022

ISSN: 0885-6125, 1573-0565

DOI: https://doi.org/10.1007/s10994-022-06263-z